First, I read in all of the text files. This region is the Midwest, and I chose at least one article from each state to get a good representation. The states in this region are North Dakota, South Dakota, Kansas, Nebraska, Minnesota, Iowa, Missouri, Wisconsin, Illinois, Indiana, Michigan, and Ohio.
Next, I made a frequency dataframe of the words in each article, excluding stop words.
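As a rough illustration of this step, here is a minimal base R sketch of counting words after dropping stop words. The article text and the short stop-word list are made up for the example; the real analysis likely used a full stop-word list (tidytext ships one as `stop_words`), so treat this as a sketch of the idea rather than the code behind the results.

```r
# Toy article text (hypothetical; stands in for one of the text files).
article <- "Farmers in the region said the floods hit the farmland hard, and the floods are expected to return."

# Tiny hypothetical stop-word list; a real analysis would use a full list.
stop_words_small <- c("the", "in", "and", "are", "to", "said")

# Lowercase, split on anything that is not a letter or apostrophe, drop empties.
words <- strsplit(tolower(article), "[^a-z']+")[[1]]
words <- words[words != ""]

# Remove stop words, then count and sort by descending frequency.
kept <- words[!(words %in% stop_words_small)]
freq <- as.data.frame(table(word = kept), stringsAsFactors = FALSE)
freq <- freq[order(-freq$Freq), ]

print(head(freq))
```

The result has one row per remaining word with its count in `Freq`, which is the shape the later sentiment and tf_idf steps work from.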
It was interesting to see the wide range of issues at the top of these frequency lists. Some articles talked a lot about politics and government, others about farmland, forests, and migration, and some about floods and other extreme weather.
Then I used the afinn, nrc, and bing lexicons to get sentiment values for each word that appeared in the frequency tables.
## # A tibble: 2,477 × 2
## word value
## <chr> <dbl>
## 1 abandon -2
## 2 abandoned -2
## 3 abandons -2
## 4 abducted -2
## 5 abduction -2
## 6 abductions -2
## 7 abhor -3
## 8 abhorred -3
## 9 abhorrent -3
## 10 abhors -3
## # … with 2,467 more rows
## # A tibble: 13,875 × 2
## word sentiment
## <chr> <chr>
## 1 abacus trust
## 2 abandon fear
## 3 abandon negative
## 4 abandon sadness
## 5 abandoned anger
## 6 abandoned fear
## 7 abandoned negative
## 8 abandoned sadness
## 9 abandonment anger
## 10 abandonment fear
## # … with 13,865 more rows
## # A tibble: 6,786 × 2
## word sentiment
## <chr> <chr>
## 1 2-faces negative
## 2 abnormal negative
## 3 abolish negative
## 4 abominable negative
## 5 abominably negative
## 6 abominate negative
## 7 abomination negative
## 8 abort negative
## 9 aborted negative
## 10 aborts negative
## # … with 6,776 more rows
Then I made tables of these sentiment values for each article. The bing tables come first, then the nrc tables, then the afinn tables.
##
## negative positive
## 21 9
##
## negative positive
## 7 13
##
## negative positive
## 15 5
##
## negative positive
## 9 16
##
## negative positive
## 19 10
##
## negative positive
## 18 14
##
## negative positive
## 22 18
##
## negative positive
## 23 26
##
## negative positive
## 35 8
##
## negative positive
## 16 8
##
## negative positive
## 18 10
##
## negative positive
## 14 6
##
## negative positive
## 2 2
##
## negative positive
## 4 9
##
## negative positive
## 23 15
##
## anger anticipation disgust fear joy negative
## 11 16 6 18 11 20
## positive sadness surprise trust
## 30 9 7 22
##
## anger anticipation disgust fear joy negative
## 5 7 2 6 7 13
## positive sadness surprise trust
## 36 3 2 20
##
## anger anticipation disgust fear joy negative
## 5 10 2 10 4 17
## positive sadness surprise trust
## 20 5 4 13
##
## anger anticipation disgust fear joy negative
## 7 10 1 3 11 7
## positive sadness surprise trust
## 28 1 5 22
##
## anger anticipation disgust fear joy negative
## 10 11 1 16 7 19
## positive sadness surprise trust
## 33 12 5 13
##
## anger anticipation disgust fear joy negative
## 9 18 6 11 10 19
## positive sadness surprise trust
## 35 9 8 14
##
## anger anticipation disgust fear joy negative
## 11 22 5 11 13 20
## positive sadness surprise trust
## 57 7 4 29
##
## anger anticipation disgust fear joy negative
## 7 18 6 9 13 19
## positive sadness surprise trust
## 50 7 4 34
##
## anger anticipation disgust fear joy negative
## 16 16 17 23 6 38
## positive sadness surprise trust
## 42 18 7 30
##
## anger anticipation disgust fear joy negative
## 7 11 3 10 4 20
## positive sadness surprise trust
## 22 7 4 17
##
## anger anticipation disgust fear joy negative
## 8 12 1 14 7 19
## positive sadness surprise trust
## 34 12 5 14
##
## anger anticipation disgust fear joy negative
## 5 13 3 9 8 14
## positive sadness surprise trust
## 23 4 3 17
##
## anger anticipation disgust fear joy negative
## 3 8 1 7 3 11
## positive sadness surprise trust
## 23 7 3 14
##
## anger anticipation fear joy negative positive
## 3 10 3 6 8 28
## sadness surprise trust
## 2 1 19
##
## anger anticipation disgust fear joy negative
## 11 16 4 16 11 25
## positive sadness surprise trust
## 37 12 4 27
##
## -3 -2 -1 1 2 3
## 4 6 5 6 4 1
##
## -2 -1 1 2 3
## 4 3 9 7 1
##
## -3 -2 -1 1 2
## 3 7 4 7 1
##
## -2 1 2 3
## 3 14 12 2
##
## -3 -2 -1 1 2
## 2 10 6 4 3
##
## -3 -2 -1 1 2 3 4
## 5 12 1 7 2 1 1
##
## -3 -2 -1 1 2 3
## 2 7 7 14 5 1
##
## -3 -2 -1 1 2 3 4
## 2 7 3 11 9 1 1
##
## -3 -2 -1 1 2 3
## 5 13 9 9 7 1
##
## -3 -2 -1 1 2 3
## 1 9 4 11 2 1
##
## -3 -2 -1 1 2
## 2 10 6 5 3
##
## -3 -2 -1 1 2
## 2 3 5 5 4
##
## -3 -2 -1 1 2 3
## 1 2 4 4 2 1
##
## 1 2
## 3 9
##
## -3 -2 -1 1 2
## 3 14 4 8 5
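Tables like the ones above can be produced by joining each article's words to a lexicon and tabulating the result. The sketch below uses base R with small toy lexicons; the words, labels, and scores are illustrative stand-ins, not entries copied from the real bing or afinn lexicons, and all object names are hypothetical.

```r
# Hypothetical word list from one article's frequency table.
article_words <- c("flood", "damage", "protect", "risk", "healthy", "corn")

# Toy stand-ins for the bing and afinn lexicons (values are illustrative only).
bing_toy <- data.frame(
  word = c("damage", "protect", "risk", "healthy"),
  sentiment = c("negative", "positive", "negative", "positive"),
  stringsAsFactors = FALSE
)
afinn_toy <- data.frame(
  word = c("damage", "protect", "risk"),
  value = c(-3, 1, -2),
  stringsAsFactors = FALSE
)

# Inner join: words that are missing from a lexicon are dropped.
freq <- data.frame(word = article_words, stringsAsFactors = FALSE)
bing_matched <- merge(freq, bing_toy, by = "word")
afinn_matched <- merge(freq, afinn_toy, by = "word")

# Tabulate the matched words by label (bing) or by score (afinn).
bing_table <- table(bing_matched$sentiment)
afinn_table <- table(afinn_matched$value)

print(bing_table)
print(afinn_table)
```

Because the join only keeps words found in the lexicon, the totals in each table can be much smaller than the number of distinct words in the article, and they can differ from one lexicon to the next.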
For the bing tables, each word was rated either positive or negative. Most of the articles I chose had slightly more negative ratings than positive ones. The only states with articles that had more positive than negative ratings were North Dakota, South Dakota, Indiana, and Michigan. The only state with a heavy skew was Minnesota, with 35 negative and only 8 positive.
What surprised me most about the afinn tables was Michigan. Both articles I chose from Michigan were left skewed, with relatively high sentiment values compared to the other articles. Another surprising point was that none of the words had a score higher than 4 or lower than -3. I thought that some of the more conservative states might have higher sentiment values and the more liberal states much lower ones, but almost all of them followed a roughly normal distribution. I saw the same pattern in the nrc tables as in the afinn tables. I decided not to make histograms of the afinn tables, since I was able to interpret the tables themselves.
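One way to put a number on how far an afinn table leans is a count-weighted mean score. This is an added illustration, not part of the original analysis; the counts are copied from the first afinn table in the output above, and the variable names are hypothetical.

```r
# Counts copied from the first afinn table above (scores -3 to 3).
scores <- c(-3, -2, -1, 1, 2, 3)
counts <- c(4, 6, 5, 6, 4, 1)

# Count-weighted mean score: below zero means the matched words lean negative.
mean_score <- sum(scores * counts) / sum(counts)

# Share of matched words that have a negative score.
negative_share <- sum(counts[scores < 0]) / sum(counts)

print(round(mean_score, 2))
print(round(negative_share, 2))
```

For this table the mean works out to -12/26, or about -0.46, and 15 of the 26 matched words are negative, so the article leans mildly negative. Running the same two lines on each table would give a single value per article that is easier to compare across states than the raw counts.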
I don’t think I came to any more conclusions from the word clouds than from just looking at the frequency tables. The differing text sizes made it easier to see the differences in frequency within one article, but the word clouds didn’t help much for comparing between articles.
Finally I made the tf_idf dataframe.
Some of the words with the highest tf_idf values were “forest”, “agriculture”, “education”, and “organic”.
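For readers unfamiliar with tf_idf, the sketch below computes it by hand in base R on made-up counts. It assumes the usual definition, term frequency times the natural log of (number of articles / number of articles containing the word), which is also what tidytext's `bind_tf_idf()` computes. The two articles and their counts are hypothetical.

```r
# Hypothetical per-article word counts, one row per (article, word) pair.
counts <- data.frame(
  doc  = c("A", "A", "B", "B"),
  word = c("climate", "forest", "climate", "organic"),
  n    = c(3, 2, 4, 1),
  stringsAsFactors = FALSE
)

# Term frequency: share of the article's counted words taken up by each word.
counts$tf <- counts$n / ave(counts$n, counts$doc, FUN = sum)

# Inverse document frequency: ln(articles in total / articles containing the word).
# Counting rows per word works because each (article, word) pair appears once.
n_docs <- length(unique(counts$doc))
docs_with_word <- ave(rep(1, nrow(counts)), counts$word, FUN = sum)
counts$idf <- log(n_docs / docs_with_word)

# tf_idf is high for words that are frequent in one article but rare elsewhere.
counts$tf_idf <- counts$tf * counts$idf
counts <- counts[order(-counts$tf_idf), ]

print(counts)
```

In this toy data “climate” appears in both articles, so its idf and tf_idf are zero, while “forest” ranks first. That is why a word shared by every article on the same topic drops out of the top of a tf_idf list even when it is the most frequent word.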
Next are the articles for the Southwest region.
I started by reading in all of the articles I chose. This region has only four states, so I chose four articles from each state, and for each state I used the same newspaper. The states in this region are Texas, Oklahoma, Arizona, and New Mexico.
Next I made frequency tables of each article, excluding the stop words.
Most of these articles had high-frequency words about the economy and what seemed to be the economic impacts of climate change. Only a few had words about politics and government, emissions and chemicals, or weather.
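Then I used the afinn, nrc, and bing lexicons to get sentiment values for each word in the frequency tables. The sentiment tables come in four blocks of four articles; each block lists the bing tables, then the nrc tables, then the afinn tables.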
##
## negative positive
## 56 40
##
## negative positive
## 78 56
##
## negative positive
## 39 18
##
## negative positive
## 21 19
##
## anger anticipation disgust fear joy negative
## 25 46 14 31 31 63
## positive sadness surprise trust
## 107 16 22 70
##
## anger anticipation disgust fear joy negative
## 22 49 20 31 32 69
## positive sadness surprise trust
## 101 32 23 64
##
## anger anticipation disgust fear joy negative
## 18 25 9 23 12 42
## positive sadness surprise trust
## 51 16 13 35
##
## anger anticipation disgust fear joy negative
## 7 9 6 13 7 25
## positive sadness surprise trust
## 29 9 3 12
##
## -3 -2 -1 1 2 3 4
## 7 19 15 19 16 2 1
##
## -3 -2 -1 1 2 3 4
## 10 28 16 25 23 8 3
##
## -3 -2 -1 1 2 4
## 8 15 14 9 7 1
##
## -3 -2 -1 1 2 3 4
## 2 7 5 7 5 1 1
##
## negative positive
## 23 22
##
## negative positive
## 13 9
##
## negative positive
## 14 7
##
## negative positive
## 20 14
##
## anger anticipation disgust fear joy negative
## 8 19 2 11 12 24
## positive sadness surprise trust
## 55 7 7 36
##
## anger anticipation disgust fear joy negative
## 7 14 2 5 12 17
## positive sadness surprise trust
## 46 2 1 34
##
## anger anticipation disgust fear joy negative
## 5 9 4 15 8 19
## positive sadness surprise trust
## 24 5 5 23
##
## anger anticipation disgust fear joy negative
## 13 20 9 15 12 27
## positive sadness surprise trust
## 34 10 11 31
##
## -4 -3 -2 -1 1 2 3
## 1 6 6 7 14 9 2
##
## -3 -2 -1 1 2 3
## 2 5 4 11 7 1
##
## -3 -2 -1 1 2
## 1 4 2 7 5
##
## -3 -2 -1 1 2
## 2 6 3 5 8
##
## negative positive
## 23 10
##
## negative positive
## 20 18
##
## negative positive
## 5 5
##
## negative positive
## 18 10
##
## anger anticipation disgust fear joy negative
## 6 10 2 9 6 23
## positive sadness surprise trust
## 27 7 4 14
##
## anger anticipation disgust fear joy negative
## 10 16 2 13 11 23
## positive sadness surprise trust
## 43 6 6 23
##
## anger anticipation disgust fear joy negative
## 4 9 4 5 3 8
## positive sadness surprise trust
## 24 2 2 20
##
## anger anticipation disgust fear joy negative
## 7 9 3 7 7 15
## positive sadness surprise trust
## 29 3 4 19
##
## -3 -2 -1 1 2 3 4
## 2 5 1 7 4 2 2
##
## -3 -2 -1 1 2
## 1 7 6 7 9
##
## -3 -2 -1 1 2
## 1 1 4 6 5
##
## -3 -2 -1 1 2 3
## 2 2 6 10 3 1
##
## negative positive
## 20 18
##
## negative positive
## 10 3
##
## negative positive
## 20 20
##
## negative positive
## 29 13
##
## anger anticipation disgust fear joy negative
## 10 23 5 15 13 27
## positive sadness surprise trust
## 49 11 9 28
##
## anger anticipation disgust fear joy negative
## 6 9 3 6 9 13
## positive sadness surprise trust
## 23 4 6 13
##
## anger anticipation disgust fear joy negative
## 10 12 7 19 11 31
## positive sadness surprise trust
## 34 13 4 19
##
## anger anticipation disgust fear joy negative
## 13 13 9 23 8 35
## positive sadness surprise trust
## 31 21 8 18
##
## -3 -2 -1 1 2 3
## 2 7 6 8 7 2
##
## -3 -2 -1 1 2
## 1 6 5 5 4
##
## -3 -2 -1 1 2 3
## 2 10 4 7 13 1
##
## -4 -3 -2 -1 1 2
## 1 10 12 6 8 9
For these tables, I wanted to compare the trends within each state. In the afinn tables, I found Texas to be very right skewed, with many more values in the negative numbers than in the positive ones. Oklahoma was the only state with an article that had more positive numbers than negative, and the Arizona and New Mexico articles more or less followed a normal distribution.
The bing tables showed about the same thing, except that no article had more positive values than negative ones, so it is surprising that the Oklahoma article mentioned above had higher afinn scores.
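One possible reason the two lexicons can disagree for the same article is that they do not contain the same words, so each one scores a different subset of the text. The toy example below shows this effect. The words, labels, and scores are invented for illustration and are not taken from the real lexicons, with `NA` marking a word that a lexicon does not contain.

```r
# Hypothetical words from one article with toy lexicon entries.
# NA means the word is missing from that lexicon.
matched <- data.frame(
  word  = c("crisis", "threat", "shortage", "growth", "support", "share"),
  bing  = c("negative", "negative", "negative", "positive", "positive", NA),
  afinn = c(-3, NA, NA, 2, 2, 1),
  stringsAsFactors = FALSE
)

# bing: count labels (table() drops NA by default).
bing_counts <- table(matched$bing)

# afinn: reduce each score to its sign, then count (NA scores are dropped).
afinn_sign <- table(ifelse(matched$afinn < 0, "negative", "positive"))

print(bing_counts)
print(afinn_sign)
```

Here bing sees three negative and two positive words, while afinn sees one negative and three positive, even though both were applied to the same six words. Whether this explains the Oklahoma result would need a check of which words each lexicon actually matched in that article.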
As with the histograms, I’m not sure how much the word clouds helped me, since it wasn’t easy to compare one word cloud to another. But I could see which words in each article appeared many more times than others, since they were printed in a bigger size. It was easier to understand the scale of each word from the word cloud than from the frequency table.
Finally I made the tf_idf dataframe.
Some of the words with the highest tf_idf values were “water”, “plant”, “energy”, and “oil”. I think this is an interesting contrast with words like “education” and “agriculture” from the Midwest region. It seems like the Southwest region is more concerned with the root of the problem and possible solutions, whereas the Midwest region is concerned with the effects climate change could have.